Stable Feature Selection for Biomarker Discovery
Feature selection techniques have been used as the workhorse in biomarker
discovery applications for a long time. Surprisingly, the stability of feature
selection with respect to sampling variations has long been under-considered.
Only recently has this issue begun to receive growing attention.
In this article, we review existing stable feature selection methods for
biomarker discovery using a generic hierarchical framework. We have two
objectives: (1) providing an overview of this new yet fast-growing topic for
convenient reference; (2) categorizing existing methods under an expandable
framework for future research and development.
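As a rough illustration of what stability with respect to sampling variations can mean in practice, the sketch below scores a feature selector by the average pairwise Jaccard similarity of the feature subsets it returns on different subsamples. This particular measure is an assumption chosen for illustration, not necessarily one the review adopts, and the function names and gene names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature subsets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(subsets):
    """Average pairwise Jaccard similarity across selected feature subsets.

    `subsets` holds one selected-feature set per subsample of the data.
    A value near 1 means the selector picks nearly the same features
    regardless of which samples it sees.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical subsets selected on three subsamples of the same dataset.
runs = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB", "geneD"},
    {"geneA", "geneB", "geneC"},
]
stability = selection_stability(runs)
print(f"stability = {stability:.3f}")
```

A selector that returns very different subsets on each subsample would score close to 0 here, which is the kind of instability the abstract describes.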
A Combinatorial Perspective of the Protein Inference Problem
In a shotgun proteomics experiment, proteins are the most biologically
meaningful output. The success of proteomics studies depends on the ability to
accurately and efficiently identify proteins. Many methods have been proposed
to facilitate the identification of proteins from the results of peptide
identification. However, the relationship between protein identification and
peptide identification has not been thoroughly explained before.
In this paper, we take a combinatorial perspective on the protein
inference problem. We employ combinatorial mathematics to calculate the
conditional protein probabilities (the protein probability is the probability
that a protein is correctly identified) under three assumptions, which lead to
a lower bound, an upper bound and an empirical estimate of protein
probabilities, respectively. The combinatorial perspective enables us to obtain
a closed-form formulation for protein inference.
Based on our model, we study the impact of unique peptides and degenerate
peptides on protein probabilities. Here, degenerate peptides are peptides
shared by at least two proteins. We also study the relationship between
our model and other methods such as ProteinProphet. A confidence interval
for the protein probability can be calculated and used together with the
probability itself to filter the protein identification results. In experiments
on two datasets of standard protein mixtures and two datasets of real samples,
our method achieves results competitive with ProteinProphet while running more
efficiently.
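The distinction between unique and degenerate peptides can be sketched in a few lines. The example below only applies the definition given above (a degenerate peptide is shared by at least two proteins); it does not reproduce the probability model of ProteinInfer, and the function name, protein names and peptide sequences are hypothetical.

```python
from collections import defaultdict


def classify_peptides(protein_to_peptides):
    """Split identified peptides into unique and degenerate sets.

    `protein_to_peptides` maps each candidate protein to the peptides
    that could have come from it. A peptide is unique if it maps to
    exactly one protein and degenerate if it maps to two or more.
    """
    peptide_to_proteins = defaultdict(set)
    for protein, peptides in protein_to_peptides.items():
        for peptide in peptides:
            peptide_to_proteins[peptide].add(protein)

    unique = {p for p, prots in peptide_to_proteins.items() if len(prots) == 1}
    degenerate = {p for p, prots in peptide_to_proteins.items() if len(prots) >= 2}
    return unique, degenerate


# Hypothetical mapping: P1 and P2 share the peptide "LVNELTEFAK".
candidates = {
    "P1": ["LVNELTEFAK", "AEFVEVTK"],
    "P2": ["LVNELTEFAK", "YLYEIAR"],
    "P3": ["DLGEEHFK"],
}
unique, degenerate = classify_peptides(candidates)
print("unique:", sorted(unique))
print("degenerate:", sorted(degenerate))
```

In this toy input, the shared peptide is evidence for both P1 and P2, which is why degenerate peptides need separate treatment when computing protein probabilities.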
We name our program ProteinInfer. Its Java source code is available at
http://bioinformatics.ust.hk/proteininfe